Mining the Web for Association Similarity between Concepts
Authors
Abstract
Measures of similarity between two terms or concepts are widely used in Natural Language Processing, the Semantic Web, and related fields. There are two main kinds of methods for measuring similarity. One relies on a manually built taxonomy or ontology; the other, usually called the statistical approach, relies on a corpus. However, ontology-based methods suffer from limited coverage, and corpus-based methods suffer from sparse data. To overcome these problems, the World Wide Web was used as a very large data source for calculating similarity between concepts. Concept similarity was measured by applying association rule mining to the snippets returned by Web search engines. The most influential algorithm for association rule mining is Apriori. To improve the efficiency of Apriori and adapt it to measuring concept similarity, three main improvements were made to the algorithm. The experimental results show that the improved algorithm increases the precision of concept similarity measurement.
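The abstract does not say which association measure the paper uses or what its three Apriori improvements are, so the following is only a minimal sketch of the general idea: each search snippet is treated as a transaction, and the standard support and confidence of the item pair formed by two concepts are computed over those transactions. The function name `association_similarity`, the whitespace tokenization, and the sample snippets are all assumptions made for illustration; no real search engine is queried.

```python
from typing import Dict, Iterable


def association_similarity(snippets: Iterable[str], a: str, b: str) -> Dict[str, float]:
    """Support and confidence for the concept pair {a, b}.

    Hypothetical helper: each snippet is one transaction, and a concept
    "occurs" in a snippet if it appears as a whitespace-separated token.
    """
    a, b = a.lower(), b.lower()
    n = n_a = n_b = n_ab = 0
    for text in snippets:
        words = set(text.lower().split())
        has_a, has_b = a in words, b in words
        n += 1
        n_a += has_a              # snippets containing a
        n_b += has_b              # snippets containing b
        n_ab += has_a and has_b   # snippets containing both
    return {
        # support({a, b}): fraction of snippets containing both concepts
        "support": n_ab / n if n else 0.0,
        # confidence(a -> b): of the snippets with a, how many also have b
        "conf_a_to_b": n_ab / n_a if n_a else 0.0,
        # confidence(b -> a): of the snippets with b, how many also have a
        "conf_b_to_a": n_ab / n_b if n_b else 0.0,
    }


# Made-up snippets standing in for search-engine results (assumption).
sample_snippets = [
    "the car is an automobile with four wheels",
    "buy a used car today",
    "automobile makers report car sales",
    "a banana is a fruit",
]
result = association_similarity(sample_snippets, "car", "automobile")
print(result)
```

Confidence is asymmetric, so a symmetric similarity score would need to combine the two directions in some way (for example by taking their minimum or mean); which combination the paper uses is not stated in the abstract.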
Similar resources
Similarity measurement for describe user images in social media
Online social networks like Instagram are places for communication. These media also produce rich metadata that are useful for further analysis in many fields, including health and cognitive science. Many researchers use metadata such as hashtags and images to detect patterns of user activity. However, there are several serious ambiguities, like how reliable are these informa...
Ontology-Based Document Clustering with a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the best-known recommendation methods is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining
Industrial-scale R&D IT projects depend on many sub-technologies that must be understood, and whose risks must be analysed, before the project can begin, if it is to succeed. When planning such an industrial-scale project, the list of technologies and their associations with each other is often complex and forms a network. Discovery of this network of technologies is time consumi...
Prediction of user's trustworthiness in web-based social networks via text mining
In Social networks, users need a proper estimation of trust in others to be able to initialize reliable relationships. Some trust evaluation mechanisms have been offered, which use direct ratings to calculate or propagate trust values. However, in some web-based social networks where users only have binary relationships, there is no direct rating available. Therefore, a new method is required t...
Journal title:
- JSW
Volume 7, Issue
Pages -
Publication date 2012